
Benchmarks Website Version 3 #7643

Draft
connortsui20 wants to merge 11 commits into develop from `ct/benchmarks-v3`

Conversation

@connortsui20
Contributor

This is a branch that I'll maintain on the side.

@connortsui20 connortsui20 added the changelog/skip Do not list PR in the changelog label Apr 26, 2026

codspeed-hq Bot commented Apr 26, 2026

Merging this PR will not alter performance

✅ 1163 untouched benchmarks


Comparing ct/benchmarks-v3 (15cab9b) with develop (d763ece)


@connortsui20 connortsui20 force-pushed the ct/benchmarks-v3 branch 2 times, most recently from 1b6187a to aa87229 Compare April 26, 2026 18:56
connortsui20 and others added 9 commits April 27, 2026 09:27
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
…pt (#7638)

## Summary

Implements the alpha **emitter** component for `bench.vortex.dev` v3,
per

[`benchmarks-website/planning/components/emitter.md`](https://github.com/vortex-data/vortex/blob/ct/benchmarks-v3/benchmarks-website/planning/components/emitter.md).

**Purely additive** to v2's emission path — the existing
`-d gh-json -o ...` form is untouched.

### Rust emitter (`vortex-bench`)

- New `vortex-bench/src/v3.rs` module with one record type per `kind`
  (`query_measurement`, `compression_time`, `compression_size`,
  `random_access_time`, `vector_search_run`) plus serde-tagged
  `V3Record` enum. Field shapes match

[`02-contracts.md`](https://github.com/vortex-data/vortex/blob/ct/benchmarks-v3/benchmarks-website/planning/02-contracts.md);
  dataset/variant/scale-factor mapping follows

[`benchmark-mapping.md`](https://github.com/vortex-data/vortex/blob/ct/benchmarks-v3/benchmarks-website/planning/benchmark-mapping.md).
- Each benchmark binary gains a `--gh-json-v3 <PATH>` flag that writes
  bare records as JSONL (no envelope), alongside the legacy
  `--display-format gh-json -o ...` flow:
  - `compress-bench` — `compression_time` (encode/decode) +
    `compression_size`. Cross-format ratios are **not** emitted; ratios
    are computed read-side per `decisions.md`.
  - `datafusion-bench`, `duckdb-bench`, `lance-bench` —
    `query_measurement`, with optional memory fields populated when
    `--track-memory` is on. `QueryMeasurement` and the paired
`MemoryMeasurement` collapse into one record
(`SqlBenchmarkRunner::v3_records`).
  - `random-access-bench` — `random_access_time`, with the dataset name
    plumbed alongside `TimingMeasurement`.
  - `vector-search-bench` — `vector_search_run`, with `dataset`, `layout`,
    `threshold`, `iterations` plumbed in (they don't live on `ScanTiming`).
- `insta` snapshot tests cover one record per `kind`, scrubbing
  `commit_sha` and `env_triple`.
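
The bare-record JSONL shape can be sketched from the consumer's side. This is a minimal sketch, assuming one serde-tagged object per line keyed by `kind`; the fields beyond `kind` are illustrative, not the exact contract from `02-contracts.md`.

```python
import json

# Hypothetical v3 record: a bare JSONL line, discriminated by "kind".
# Field names other than "kind" are illustrative, not the real contract.
record = {
    "kind": "query_measurement",
    "dataset": "tpch",
    "query": "Q1",
    "engine": "datafusion",
    "time_ns": 123_456_789,
}

line = json.dumps(record)   # one record per line, no envelope
parsed = json.loads(line)
```

Because the records are bare (no envelope), the emitter can append lines as benchmarks finish and leave the wrapping to the post-ingest step.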

### Post-ingest script

`scripts/post-ingest.py` (Python 3, stdlib only — `urllib`, `json`,
`subprocess`):
- reads JSONL of records,
- fills the `commit` envelope from `git show` for the SHA passed in,
- wraps in `{run_meta, commit, records}` per the contract,
- POSTs to `<server>/api/ingest` with `Authorization: Bearer ...` from
  `INGEST_BEARER_TOKEN`,
- exits non-zero on 4xx/5xx. **No retries, no spool, no S3 outbox** —
  deferred per the alpha plan.
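
The wrapping step can be sketched as follows — a minimal stdlib sketch assuming the envelope is exactly `{run_meta, commit, records}`; the helper name and the commit/run_meta field contents are hypothetical, and the real script fills the commit fields from `git show`.

```python
import json

def build_envelope(records, sha, run_meta):
    """Wrap bare JSONL records in the {run_meta, commit, records} envelope.
    The commit fields here are illustrative; the real script derives them
    from `git show` for the SHA passed in."""
    return {
        "run_meta": run_meta,
        "commit": {"sha": sha},
        "records": records,
    }

records = [{"kind": "compression_size"}]  # parsed from the JSONL input
envelope = build_envelope(records, "d763ece", {"source": "local-smoke"})
body = json.dumps(envelope).encode()
# `body` is what gets POSTed to <server>/api/ingest with
# an `Authorization: Bearer ...` header from INGEST_BEARER_TOKEN.
```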

### Out of scope (deferred)

CI workflow integration, dual-write, `bench-orchestrator` updates,
retry/spool/outbox, replacing the v2 CLI form. All listed in

[`deferred.md`](https://github.com/vortex-data/vortex/blob/ct/benchmarks-v3/benchmarks-website/planning/deferred.md).

## Test plan

- [x] `cargo test -p vortex-bench --lib` — 48 passed (7 new `v3` tests,
one snapshot per kind plus a JSONL round-trip).
- [x] `cargo build -p vortex-bench -p compress-bench -p datafusion-bench
-p duckdb-bench -p lance-bench -p random-access-bench -p
vector-search-bench` — all clean.
- [x] `cargo clippy --all-targets` on changed crates (skipping
`duckdb-bench`, blocked by an unrelated pre-existing
`cognitive_complexity` lint in `vortex-duckdb` on `ct/benchmarks-v3`).
- [x] `cargo +nightly fmt --all`.
- [x] End-to-end smoke: `scripts/post-ingest.py` against a Python
`http.server` mock — 200 → exit 0 with `{"inserted":1,"updated":0}`; 400
→ exit 1 with the server body on stderr.
- [ ] Real round-trip against an actual alpha server — blocked on the
server component landing (acceptance criterion 3 in the emitter plan;
verifiable once the server PR exists).


---
_Generated by [Claude
Code](https://claude.ai/code/session_017qh4ju4FtkizW6s67JEhPW)_

---------

Signed-off-by: Claude <noreply@anthropic.com>
Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
…7637)

## Summary

Implements the alpha server for `bench.vortex.dev` v3 per

[`benchmarks-website/planning/components/server.md`](../tree/ct/benchmarks-v3/benchmarks-website/planning/components/server.md).

A single Rust binary that owns a DuckDB file on local disk, accepts
authenticated `/api/ingest` POSTs, and serves a small read API plus a
placeholder HTML route the web-ui PR will replace.

- **Schema** (`src/schema.rs`): `commits` dim + the five fact tables
from
`01-schema.md`. DDL is applied on boot; no migration framework at alpha.
- **Ingest** (`src/ingest.rs`): bearer-auth middleware, all-or-nothing
transactions, idempotent upsert via per-table xxhash64 `measurement_id`,
  full HTTP matrix from `02-contracts.md` (200 / 400 / 401 / 409 / 500).
- **Read API** (`src/api.rs`): `/api/groups`, `/api/chart/:slug`,
`/health`.
  Slugs are opaque base64url-encoded JSON (`src/slug.rs`) so the web-ui
  treats them as strings per the contract.
- **Records** (`src/records.rs`): per-`kind` discriminated union with
`deny_unknown_fields`, so unknown `kind`s and unknown fields fail
loudly.
- **HTML** (`src/html.rs`): placeholder root route - replaced by web-ui.
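
Opaque slugs of this shape can be sketched in a few lines — base64url over a canonical JSON chart key. The key fields below are hypothetical; the point is only that clients round-trip the string without ever parsing it.

```python
import base64
import json

def to_slug(key: dict) -> str:
    """Encode a chart key as an opaque, URL-safe slug (no padding)."""
    raw = json.dumps(key, separators=(",", ":"), sort_keys=True).encode()
    return base64.urlsafe_b64encode(raw).decode().rstrip("=")

def from_slug(slug: str) -> dict:
    """Decode a slug back to its chart key, restoring base64 padding."""
    padded = slug + "=" * (-len(slug) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))

key = {"kind": "query_measurement", "dataset": "tpch", "query": "Q1"}
slug = to_slug(key)
```

Canonical serialization (sorted keys, no whitespace) keeps the slug stable for a given key, so the same chart always gets the same URL.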

## Stack

Pinned in `benchmarks-website/server/Cargo.toml`:

- `axum = "=0.7.9"` (`http1`, `json`, `tokio`, `query`)
- `maud = "=0.26.0"` with `axum`
- `duckdb = "=1.4.1"` with `bundled`
- `tower-http = "=0.6.8"` for tracing
- `subtle = "=2.6.1"` for constant-time bearer compare
- `twox-hash = "=2.1.0"` for the `measurement_id` xxhash64
- workspace `anyhow` + `thiserror` for errors

The crate is a leaf binary outside the `vortex-*` public-API surface, so
`./scripts/public-api.sh` is intentionally skipped per the task brief.

## Routes

| Method | Path | Auth |
|---|---|---|
| `POST` | `/api/ingest` | bearer |
| `GET`  | `/api/groups` | none |
| `GET`  | `/api/chart/:slug` | none |
| `GET`  | `/health` | none |
| `GET`  | `/` | none (placeholder, web-ui replaces) |

## Test plan

- [x] `cargo build -p vortex-bench-server`
- [x] `cargo test -p vortex-bench-server` - 14 tests pass (4 unit + 10
integration)
- [x] `cargo clippy -p vortex-bench-server --all-targets -- -D warnings`
- [x] `cargo fmt -p vortex-bench-server`
- [x] Manual `cargo run` smoke: `/health`, `POST /api/ingest` (with and
without
      bearer), `/api/groups`, `/api/chart/:slug` round-trip.

Acceptance criteria from `components/server.md`:

- [x] `cargo build` succeeds for the server crate.
- [x] Integration test: POST with valid bearer → 200; re-POST → 200 with
`updated > 0, inserted = 0`; no/wrong bearer → 401; unknown `kind` →
400.
- [x] `GET /health` returns coherent shape after an ingest (db_path,
      schema_version, latest_commit_timestamp, per-table row counts).
- [x] `cargo run` against a fresh DuckDB file serves both read routes.

## Coordination

The skeleton commit (`3266b87`) was pushed before the integration test
commit so the web-ui agent can rebase onto the workspace member without
waiting for tests.

Branch: `claude/benchmarks-v3-server` → `ct/benchmarks-v3` (not develop,
not main).

---
_Generated by [Claude
Code](https://claude.ai/code/session_01MPMnGUzXCUQvdkwbhSU9HR)_

---------

Signed-off-by: Claude <noreply@anthropic.com>
Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Adds --gh-json-v3 plumbing through vx-bench and post-ingest steps
in bench.yml, sql-benchmarks.yml, plus a v3-commit-metadata workflow.
All v3 ingest is gated on vars.V3_INGEST_URL and continue-on-error,
so it's a clean no-op until the deploy track sets the variable.
v2's cat-s3.sh path is unchanged.

Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
## Summary

Implements the alpha web UI for `bench.vortex.dev` v3 per

[`benchmarks-website/planning/components/web-ui.md`](../tree/claude/vortex-benchmarks-ui-v3-QxRCK/benchmarks-website/planning/components/web-ui.md).
Replaces the placeholder `html.rs` router introduced in #7637 with two
real pages backed by Maud templates and a vendored Chart.js bundle.

- `GET /` — landing page that lists every group + chart link from
  `/api/groups`, rendered via `maud`.
- `GET /chart/{slug}` — single Chart.js line chart. Payload is fetched
  server-side via the same `api::collect_chart` helper used by
  `/api/chart/:slug`, then embedded inline as a JSON
  `<script id="chart-data">` block. No client-side round-trip after
  page load.
- `GET /static/...` — vendored `chart.umd.js` (Chart.js 4.4.4, MIT),
  `chart-init.js`, and `style.css`. All bundled into the binary via
  `include_bytes!`.

Slugs are treated as opaque per

[`02-contracts.md`](../tree/claude/vortex-benchmarks-ui-v3-QxRCK/benchmarks-website/planning/02-contracts.md):
the chart handler echoes whatever `/api/groups` returned straight into
`ChartKey::from_slug` without parsing or constructing them itself.

`api::collect_groups` and `api::collect_chart` are now `pub(crate)` so
the HTML handlers reuse the same row collectors that back the JSON
read routes — no second SQL implementation.

Between them, the chart-init script and the embedded JSON payload
satisfy the "no network round-trip after page load" criterion. Inside
the JSON `<script>` block, `</`, `<!--`, and `<script` are escaped via
JSON-safe string escapes so that benign payload contents can never
break out of the script element.
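
One stdlib way to get that guarantee is to force every `<` into a JSON string escape, so `</script>`, `<!--`, and `<script` can never appear literally in the block; whether the server does exactly this is an assumption.

```python
import json

def embed_json(payload) -> str:
    """Serialize a payload for an inline <script> data block, escaping
    '<' as \\u003c so '</script>' etc. cannot terminate the element."""
    return json.dumps(payload).replace("<", "\\u003c")

data = {"label": "</script><script>alert(1)</script>"}
safe = embed_json(data)
# `safe` is still valid JSON: \u003c decodes back to '<' on the client.
```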

## Tests

`tests/web_ui.rs` (new, 6 tests):

- `landing_page_snapshot` — `insta` snapshot of `GET /` after seeding
  three envelopes with distinct `commit.sha` / `commit.timestamp`
  values.
- `chart_page_snapshot` — `insta` snapshot of the rendered tpch-Q1
  chart page; exercises multi-series rendering
  (`datafusion:vortex-file-compressed` + `duckdb:parquet`) and verifies
  both the inline `<script id="chart-data">` block and the
  `/static/chart.umd.js` reference.
- `chart_page_round_trips_every_slug` — every slug returned by
  `/api/groups` resolves to a 200 chart page with inline data.
- `unknown_slug_renders_404` — bogus slug → 404 HTML page.
- `empty_landing_page_renders` — empty DB → "No data ingested yet."
- `static_assets_are_served` — content-type checks for the three
  `/static/*` files.

Pre-existing `tests/ingest.rs` still passes (10 tests).

## Stack inheritance

Inherits the version pins set by #7637 in
`benchmarks-website/server/Cargo.toml`. The only Cargo change is
`insta = { workspace = true }` under `[dev-dependencies]`.

## Verified locally

- `cargo build -p vortex-bench-server`
- `cargo test -p vortex-bench-server` — 10 ingest + 6 web-ui tests
  pass.
- `cargo +nightly fmt -p vortex-bench-server -- --check` — clean.
- `cargo clippy -p vortex-bench-server --all-targets` — clean.
- End-to-end smoke test against a running server:
  `INGEST_BEARER_TOKEN=test cargo run`, POSTed two envelopes with
  different commit SHAs, then verified `/`, `/chart/{slug}`, the three
  `/static/*` routes, and the invalid-slug 404 path with `curl`.

## Test plan

- [ ] Reviewer runs `cargo test -p vortex-bench-server` locally.
- [ ] Reviewer starts the server (`INGEST_BEARER_TOKEN=test cargo run -p
vortex-bench-server`), POSTs
`benchmarks-website/server/fixtures/envelope.json`,
      and visits `http://127.0.0.1:3000/` in a real browser to
      confirm the chart hydrates (this PR was developed in a
      headless sandbox so visual verification was not possible
      here).
- [ ] CI green.

## Out of scope (deferred per `web-ui.md` + `deferred.md`)

Per-commit page, filter UI, full-screen modal, deep links, LTTB
downsampling, lookup-table-driven engine names / colours,
chartjs-plugin-zoom, ratio rendering on compression-size charts, and
geomean summary cards are explicitly deferred and not touched here.


---
_Generated by [Claude
Code](https://claude.ai/code/session_01UjgnLq5MCmcpyv6PXC5oLv)_

---------

Signed-off-by: Claude <noreply@anthropic.com>
Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
… hash (#7642)

Without commit_sha in the hash input, every (dim tuple) collapses to one
row across commits via INSERT ... ON CONFLICT DO UPDATE, so the chart
pages render at most one point per series. Adding commit_sha to the
per-table hashers makes each (commit, dim) pair its own row, which is
the time series the UI is built around. Re-emission of the same (commit,
dim) is still the upsert case.

The web-ui chart_page_query snapshot now correctly shows three commits
with three points per series, matching the test fixture.

No public API change; measurement_id is server-internal.
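
The fix can be illustrated with a stand-in hash (the server uses xxhash64 via `twox-hash`; `hashlib.sha256` truncated to 64 bits is only a stdlib stand-in here, and the dim fields are hypothetical):

```python
import hashlib

def measurement_id(commit_sha: str, dims: tuple) -> int:
    """Stable 64-bit row id over (commit_sha, dim tuple).
    sha256-truncated stand-in for the server's xxhash64."""
    key = "\x1f".join((commit_sha, *dims)).encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")

dims = ("tpch", "Q1", "datafusion")
a = measurement_id("15cab9b", dims)
b = measurement_id("d763ece", dims)
# a != b: each (commit, dim) pair is its own row — a time series.
# Re-emitting the same (commit, dim) reproduces the same id: the upsert case.
```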


Signed-off-by: Claude <noreply@anthropic.com>
Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
This PR introduces the deployment infrastructure for vortex-bench-server
v3, a new benchmarking server that runs alongside the existing v2
instance. The v3 server provides an ingest endpoint for benchmark
results with bearer token authentication and uses DuckDB for data
storage.

1. **GitHub Actions workflow** (`publish-bench-server.yml`): New CI
pipeline that builds and publishes the vortex-bench-server Docker image
to GHCR on changes to the server code, vortex-bench crate, or
Cargo.lock.

2. **Dockerfile** (`benchmarks-website/server/Dockerfile`): Multi-stage
Docker build that:
   - Compiles vortex-bench-server in a Rust 1.91 environment
   - Packages it with DuckDB CLI tools in a minimal Debian image
   - Targets ARM64 architecture for EC2 deployment

3. **Backup script** (`benchmarks-website/server/scripts/backup.sh`):
Daily backup utility that:
   - Exports the DuckDB database from the running container
   - Uploads backups to S3 (`vortex-ci-benchmark-results/v3-backups/`)
   - Manages local disk space by retaining only the latest backup

4. **Docker Compose configuration**: Added vortex-bench-server service
that:
   - Runs on port 3001 (v2 remains on port 80)
   - Mounts EBS-backed data directory for DuckDB persistence
   - Loads bearer token from `/etc/vortex-bench/secrets.env`
   - Integrates with existing watchtower for automatic image updates

5. **EC2 initialization guide** (`ec2-init.txt`): Comprehensive setup
documentation covering:
   - Bearer token secret management
   - EBS volume preparation
   - Service startup and health checks
   - Cron-based backup scheduling
   - Token rotation procedures

The v3 server is designed to run additively alongside v2, allowing for
gradual DNS migration and dual-write support from CI.

The Docker image build is validated by the GitHub Actions workflow on
each push to develop. The backup script can be tested manually on the
EC2 host before cron scheduling. Smoke tests are documented in the setup
guide (curl against `/health` endpoint on port 3001).

https://claude.ai/code/session_019mBcBdF4LhKDXyKwuKRAPV

---------

Signed-off-by: Claude <noreply@anthropic.com>
Co-authored-by: Claude <noreply@anthropic.com>
This is a one-shot migration binary to take all of the data from
`data.json.gz` and bring it into a duckdb database.

Simply gathers and aggregates everything into memory and writes data in
chunks with arrow arrays. Insert row-by-row took way too long, and the
appender API in duckdb does not support `BIGINT[]` for some reason...
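
The chunked-write approach can be sketched generically; the chunk size and helper below are hypothetical — the real binary builds Arrow arrays per chunk rather than Python lists.

```python
def chunks(rows, size=10_000):
    """Yield fixed-size batches; the migration writes one Arrow-backed
    batch per chunk instead of issuing one INSERT per row."""
    for i in range(0, len(rows), size):
        yield rows[i:i + size]

rows = list(range(25))
batches = list(chunks(rows, size=10))
# → batches of 10, 10, and 5 rows, preserving order
```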

---------

Signed-off-by: Claude <noreply@anthropic.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
